Compiling French-Japanese Terminologies from the Web

Authors

  • Xavier Robitaille
  • Yasuhiro Sasaki
  • Masatsugu Tonoike
  • Satoshi Sato
  • Takehito Utsuro
Abstract

We propose a method for compiling bilingual terminologies of multi-word terms (MWTs) for given translation pairs of seed terms. Traditional methods for bilingual terminology compilation exploit parallel texts, while more recent ones have focused on comparable corpora. We use bilingual corpora collected from the web and tailor-made for the seed terms. For each language, we extract from the corpus a set of MWTs pertaining to the seed's semantic domain, and use a compositional method to align MWTs from both sets. We increase the coverage of our system by using thesauri and by applying a bootstrap method. Experimental results show high precision and indicate promising prospects for future developments.
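
The abstract does not spell out how the compositional alignment works, so the following is only a minimal sketch of one common reading of "compositional": translate each content word of a French MWT with a bilingual dictionary, then keep the candidates whose components match an extracted Japanese MWT. The names `FR_STOPWORDS`, `content_words`, and `align`, the toy dictionary, the pre-segmented Japanese terms, and the order-insensitive matching are all assumptions made for illustration, not the authors' implementation.

```python
from itertools import product

# Hypothetical list of French function words to ignore (assumption, not from the paper).
FR_STOPWORDS = {"de", "du", "des", "la", "le", "les", "à", "en"}


def content_words(term):
    """Split a French term on whitespace and drop function words."""
    return [w for w in term.lower().split() if w not in FR_STOPWORDS]


def align(french_terms, japanese_terms, dictionary):
    """Pair French MWTs with Japanese MWTs by word-for-word translation.

    japanese_terms: iterable of tuples of already-segmented components.
    dictionary: maps a French word to a list of Japanese translations.
    Matching ignores component order, since the two languages order
    heads and modifiers differently (a simplifying assumption).
    """
    # Index Japanese terms by their sorted components.
    index = {}
    for ja in japanese_terms:
        index.setdefault(tuple(sorted(ja)), []).append(ja)

    pairs = []
    for fr in french_terms:
        words = content_words(fr)
        options = [dictionary.get(w) for w in words]
        # Skip terms with no content words or with an untranslatable word.
        if not words or any(o is None for o in options):
            continue
        seen = set()
        # Try every combination of per-word translations.
        for combo in product(*options):
            for ja in index.get(tuple(sorted(combo)), []):
                if ja not in seen:
                    seen.add(ja)
                    pairs.append((fr, "".join(ja)))
    return pairs


if __name__ == "__main__":
    toy_dict = {
        "énergie": ["エネルギー"],
        "solaire": ["太陽", "ソーラー"],
        "cellule": ["電池", "細胞"],
    }
    fr_terms = ["énergie solaire", "cellule solaire", "énergie éolienne"]
    ja_terms = [("太陽", "エネルギー"), ("太陽", "電池"), ("風力", "発電")]
    for fr, ja in align(fr_terms, ja_terms, toy_dict):
        print(fr, "->", ja)
```

In this sketch a term such as "énergie éolienne" is simply skipped because one of its words is missing from the toy dictionary. That is the kind of coverage gap the abstract says is addressed with thesauri and bootstrapping, neither of which is modelled here.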

Similar resources

Improving information retrieval with multiple health terminologies in a quality-controlled gateway

BACKGROUND The Catalog and Index of French-language Health Internet resources (CISMeF) is a quality-controlled health gateway, primarily for Web resources in French (n=89,751). Recently, we achieved a major improvement in the structure of the catalogue by setting up multiple terminologies, based on twelve health terminologies available in French, to overcome the potential weakness of the MeSH t...

Multiple Terminologies in a Health Portal: Automatic Indexing and Information Retrieval

Background: In the specific context of developing quality-controlled health gateways, several standards must be respected (e.g. Dublin Core for metadata element set; thesaurus MeSH as the controlled vocabulary to index Internet resources; HON code to accredit quality of health Web sites). These standards were applied to create the CISMeF Web site (French acronym for Catalog & Index of Health Inte...

Terminology-driven Augmentation of Bilingual Terminologies

This paper proposes a way of augmenting bilingual terminologies by using a “generate and validate” method. Using existing bilingual terminologies, the method generates “potential” bilingual multi-word term pairs and validates their status by searching web documents to check whether such terms actually exist in each language. Unlike most existing bilingual term extraction methods, which use para...

MuEVo, a Breast Cancer Consumer Health Vocabulary Built Out of Web Forums

Semantically analyzing patient-generated text from a biomedical perspective is challenging because of the vocabulary gap between patients and health professionals. Medical expertise and vocabulary are well formalized in standard terminologies and ontologies, which enable semantic analysis of expert-generated text; however resources which formalize the vocabulary of health consumers (patients a...

BIOTEX: A system for Biomedical Terminology Extraction, Ranking, and Validation

Term extraction is an essential task in domain knowledge acquisition. Although hundreds of terminologies and ontologies exist in the biomedical domain, the language evolves faster than our ability to formalize and catalog it. We may be interested in the terms and words explicitly used in our corpus in order to index or mine this corpus or just to enrich currently available terminologies and ont...

Publication date: 2006